Performance evaluation of ML models for preoperative prediction of HER2-low BC based on CE-CBBCT radiomic features: A prospective study

To explore the value of machine learning (ML) models based on contrast-enhanced cone-beam breast computed tomography (CE-CBBCT) radiomics features for the preoperative prediction of human epidermal growth factor receptor 2 (HER2)-low expression breast cancer (BC). Fifty-six patients with HER2-negative invasive BC who underwent preoperative CE-CBBCT were prospectively analyzed. Patients were randomly divided into training and validation cohorts at approximately 7:3. A total of 1046 quantitative radiomic features were extracted from CE-CBBCT images and normalized using z-scores. The Pearson correlation coefficient and recursive feature elimination were used to identify the optimal features. Six ML models were constructed based on the selected features: linear discriminant analysis (LDA), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost (AB), and decision tree (DT). To evaluate the performance of these models, receiver operating characteristic curves and area under the curve (AUC) were used. Seven features were selected as the optimal features for constructing the ML models. In the training cohort, the AUC values for SVM, LDA, RF, LR, AB, and DT were 0.984, 0.981, 1.000, 0.970, 1.000, and 1.000, respectively. In the validation cohort, the AUC values for the SVM, LDA, RF, LR, AB, and DT were 0.859, 0.880, 0.781, 0.880, 0.750, and 0.713, respectively. Among all ML models, the LDA and LR models demonstrated the best performance. The DeLong test showed that there were no significant differences among the receiver operating characteristic curves in all ML models in the training cohort (P > .05); however, in the validation cohort, the DeLong test showed that the differences between the AUCs of LDA and RF, AB, and DT were statistically significant (P = .037, .003, .046). The AUCs of LR and RF, AB, and DT were statistically significant (P = .023, .005, .030). Nevertheless, no statistically significant differences were observed when compared to the other ML models. ML models based on CE-CBBCT radiomics features achieved excellent performance in the preoperative prediction of HER2-low BC and could potentially serve as an effective tool to assist in precise and personalized targeted therapy.


Introduction
Breast cancer (BC) is a heterogeneous disease that varies among molecular subtypes.These subtypes differ in terms of frequency, prognosis, response to treatment, and survival outcomes. [1][4] This categorization includes a wide range of cases, including triple-negative and luminal-type hormone receptor (HR)-positive breast tumors.These patients are classified as HER2-negative BC on the current HER2 validation guidelines, which means that HER2-targeted therapies are not recommended or effective in these patients. [5]Trastuzumab deruxtecan, a novel antibody-drug conjugate, has shown promise in treating patients with HER2-low metastatic BC according to the results of the DESTINY-Breast04 study. [6]According to recent clinical research, patients with HER2-low BC have unique clinicopathological traits, with varying prognoses and rates of pathological complete response compared to HER2-zero (immunohistochemistry [IHC] score 0) BC. [7][8][9] Therefore, accurately distinguishing patients with HER2-low BC from HER2negative BC is crucial for selecting the most appropriate therapy and predicting the treatment outcomes in this population.
In clinical practice, HER2 status can be determined based on IHC and/or in situ hybridization; however, owing to the heterogeneity of HER2 expression, approximately 15.3% and 7.3% of HER2-low BCs are mistakenly classified as HER2-negative and HER2-positive by core needle biopsy, respectively. [10]Core needle biopsy is an invasive procedure.Thus, the development of noninvasive methods that can accurately and quickly determine HER2 status is imperative.
13] It provides high spatial-resolution 3D images of the breast, allows for rapid data acquisition, and offers a comprehensive display of lesion morphology, calcification, and enhancement.CE-CBBCT has shown promising results in lesion identification and differentiation, [14,15] molecular subtyping, [16,17] assessment of disease extent, [12,18] and prediction of axillary lymph node metastasis. [19,20]adiomics based on radiology medical images is a technique used in medical image analysis to uncover hidden factors within image pixels.This method involves manual or automatic collection of quantitative parameters from medical imaging, which may be associated with genotypic and molecular characteristics. [21,22]The noninvasive nature of radiomics allows for the analysis and tracking of lesion distribution over time, eliminating the need for repeat biopsies.Machine learning (ML) algorithms play a crucial role in identifying and detecting significant radiomic features that support outcome predictions.By constructing predictive models using various ML algorithms and comparing their performance, the most effective predictive models can be extracted. [23,24]revious studies [25,26] have demonstrated that ML models based on magnetic resonance imaging (MRI) radiomic features can predict HER2-low BC; however, whether CE-CBBCT radiomic features can discriminate HER2-low from HER2-negative BCs is uncertain.Therefore, this study aimed to develop and validate ML models based on CE-CBBCT radiomics features to predict HER2-low BC.In addition, we evaluated the effectiveness of various ML methods for determining the best prediction model.

Study participants
This prospective study was approved by the Institutional Ethics Board and each patient provided written informed consent.
The inclusion criteria were as follows: patients with suspicious BC (detected by mammography or ultrasound) who underwent CE-CBBCT examinations at Guangxi Medical University Cancer Hospital (Nanning, China) between Aug 2022 and Jun 2023 and CE-CBBCT examinations completed 2 weeks before surgery. [25]The exclusion criteria were preoperative interventions and therapies, poor image quality with noticeable artifacts, tumor diameter ≤ 1 cm, ductal carcinoma in situ and HER2 positive BC, equivocal FISH results, and incomplete clinical and pathological data.A total of 137 patients were enrolled in this study; 81 patients were excluded.Finally, 56 HER2-negative invasive BCs patients were enrolled, including 33 HER2-low and 23 HER2-zero cases.Patients were randomly assigned to the training and validation cohorts at a ratio of approximately 7:3. Figure 1 shows the patient flowchart of the study.

CBBCT protocol
The CBBCT system has been certified for breast imaging diagnostics by the National Food and Drug Administration of China and the USA specialized flat-panel detector breast CT scanner (Koning Breast CT, CBCT 1000, Koning Corporation, KBCT1000, Tianjin Corning Medical Equipment Co., Ltd.,Tianjin, PR China) was used for CBBCT in all the patients.The patients were positioned prone on the examination table and underwent breast scanning without compression.As previously reported, [14,15,27] CE-CBBCT scans were performed with a constant tube voltage of 49 kVp [27] and the tube current was automatically adjusted (between 50 and 160 mA) based on the size and density of the glandular tissues.A complete CE-CBBCT scan consists of 2 parts: non-contrast-enhanced and postcontrast-enhanced.Postcontrast scans were performed twice, 60 seconds (phase 1) and 110 seconds (phase 2) after the contrast medium was administered, as described in previous studies. [14,15]For the CE-CBBCT scans, a nonionic iodinated contrast material (iohexol, 350 mg/mL; Yangtze River Pharmaceutical Co., Ltd., Jiangsu, PR China) was intravenously administered at a flow rate of 2 mL/s and a dose of 1.5 to 2.0 mL/kg with a dual-chamber power injector (MEDRAD Stellant D-CE, Bayer Medical Care Inc 1 Bayer Drive Indianola, PA 15051-07B0 USA), followed by a 30 mL bolus injection of saline solution at the same rate of 2.0 mL/s.

Image analysis
The imaging characteristics of the patients were evaluated by 2 radiologists with 15 (read 1) and 5 (read 2) years of expertise in breast imaging.During evaluation, both doctors were unaware of the patients' pathological findings.The evaluation and reporting of CBBCT characteristics followed the recommendations specified in the American College of Radiology Breast Imaging Reporting and Data System atlas and lexicon, specifically the MRI and mammography sections.Two physicians arrived at a consensus regarding the interpretation of the following CBBCT features.

Clinicopathological data
To collect clinical and pathological data, a thorough evaluation of the patients' medical records and pathological findings from the resected specimens was conducted.The expression of estrogen receptor, progesterone receptor, and HER2 was evaluated using avidin-biotin IHC following the guidelines established by the American Society of Clinical Oncology/College of American Pathologists. [5]Positive estrogen receptor/progesterone receptor expression was defined as the presence of staining in >1% of tumor cell nuclei.A cutoff value of 20% was used to assess the Ki-67 index. [25]HER2 status was confirmed postoperatively using IHC or FISH.Further FISH validation was performed for tumors with equivocal 2+ results.Based on FISH scores, patients were categorized into 4 groups: HER2-positive if a HER2 staining intensity score of 3+ or 2+ plus positive FISH was observed; HER2-negative if a score of 0, 1+, or 2+ plus negative FISH was obtained; HER2-low if a score of 1+ or 2+ plus negative FISH was obtained; and HER2-zero if a score of 0 was obtained.

Tumor segmentation
All DICOM format images from the first phase of CE-CBBCT were transferred to ITK-SNAP software (version 3.6.0;http:// www.itksnap.org)for 3-dimensional (3D) segmentation of the region of interest (ROI).The radiologist (read 1) manually delineated the ROIs on all slices, ensuring accurate delineation of the tumor boundaries.To minimize the influence of the partial-volume effect, the selected ROIs were slightly smaller than those observed by the naked eye. [28]To assess the stability of the features, 1 month later, both read 1 and read 2 independently performed segmentation on a randomly selected sample of 30 cases from the entire study set.The consistency and reproducibility of the features were evaluated using intraclass correlation coefficients (ICC).ICC values above 0.75 were considered indicative of good consistency. [26,29]

Radiomics feature extraction
[31] Wavelet imaging filters and Laplacian of Gaussian (LoG) filtering were used to process the original images and generate supplementary images to enhance the abundance of features.From the original, LoG, and wavelet images, 1046 quantitative radiomics features were extracted using PyRadiomics (https://github.com/Radiomics/pyradiomics). [32]hese included 18 first-order statistical features, 14 shapebased features (3D), 68 texture features, 688 wavelet features, and 258 LoG features.Radiomic features were extracted and standardized following the Imaging Biomarker Standardization Initiative. [33]Detailed information on the formulae for calculating the radiomics features can be found at https://pyradiomics. readthedocs.io.

Feature selection and model building
To ensure reliable classification results and avoid any adverse impact on classifier performance due to sample imbalance, the synthetic minority oversampling technique [34,35] was used to balance the datasets.Before reducing the dimensions, Z-score normalization was used to standardize the radiomic features and ensure a uniform scale of the feature values.To reduce the dimensionality of all radiomic features, Pearson correlation coefficient (PCC) [26] was used to identify highly correlated feature pairs.If the absolute PCC was ≥0.99, the feature with superior performance was retained, indicating a strong correlation between the 2 features.Subsequently, the recursive feature elimination (RFE) [35] method was used to obtain an optimal subset of features considering their redundancy and distinguishing power.The objective of RFE is to select features iteratively by reducing the feature set size at each iteration.Initially, the estimator is trained on the full set of features and the importance of each feature is determined using specific attributes (such as coef_ feature_importances_) or a callable function.Subsequently, the least important features were removed from the current set, and this process was repeated recursively on the pruned set until the desired number of features was achieved (more information can be found at https://scikit-learn.org/stable/modules/feature_selection.html#recursive-feature-elimination).
Six ML models, namely linear discriminant analysis (LDA), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost (AB), and decision tree (DT) (the details of the ML algorithms are described in the Supplementary Material, Supplemental Digital Content, http://links.lww.com/MD/M867 or can be found at https:// scikit-learn.org/stable/user_guide.html#) were constructed using the optimal feature subsets selected by dimensionality reduction.An independent validation cohort was used to assess the predictive performance of the models, using the same thresholds as those determined in the training cohort.The performance was evaluated using a 10-fold cross-validation method, ensuring that the classes were roughly balanced in each fold.
To evaluate the effectiveness of the aforementioned models, receiver operating characteristic (ROC) curves were drawn, and the area under the curve (AUC), accuracy, Matthews correlation coefficient, specificity, sensitivity, positive predictive value, and negative predictive value were calculated using FeAture Explorer Pro (FAE; v0.5.6) in Python v3.7.6.The FAE code can be found at https://github.com/salan668/FAE. [36]Figure 2 shows the radiomics workflow.

Statistical analysis
The homogeneity and normality of the continuous variables were assessed using the Shapiro-Wilk and F tests.Continuous variables with normal distributions are expressed as mean ± SD, whereas skewed distributed variables are presented as medians (lower quartile, upper quartile).The classification data are expressed as the relative distribution frequency and percentage.Independent-sample T-tests and Mann-Whitney U tests were used to compare normal and skewed distributions, respectively.Categorical variables were compared using the chi-square and Fisher exact tests.The Delong test was used to compare the ROC curves between the models.All statistical analyses were 2-sided, and P < .05 was considered statistically significant.All statistical analyses were performed using SPSS 26.0 (IBM Corp., Armonk).

Clinicopathological and CE-CBBCT characteristics
All patients were women with a mean age of 49.71 ± 10.45 years (range: 31-80 years).The patients were divided into training (n = 39) and validation (n = 17) cohorts.In the training cohort, 23 (59.0%) patients were HER2-low and 16 (41.0%)were HER2-zero; in the validation cohort, 10 (58.8%) were HER2-low and 7 (41.2%)were HER2-zero.The clinicopathological characteristics of the 2 cohorts were compared, and no significant differences were found (P > .05),as shown in (Table 1).Similarly, no significant differences in the CBBCT findings were observed between the training and validation cohorts (Table 2).

Prediction performance of ML models
As shown in Table 4, in the training cohort, the AUC of the SVM, LDA, RF, LR, AB, and DT were 0.984, 0.981, 1.000, 0.970, 1.000, and 1.000, respectively.In the validation cohort, the AUC of the SVM, LDA, RF, LR, AB, and DT were 0.859, 0.880, 0.781, 0.880, 0.750, and 0.713, respectively.Figure 3 shows the ROC curves of the 6 ML models.The DeLong test showed that there were no significant differences among the ROCs of all ML models in the training cohort (P > .05);However, in the validation cohort, the results of the DeLong test showed that the difference between the AUC of LDA and RF, AB, and DT was statistically significant (P = .037,.003,.046),while the DeLong test showed that the difference between the AUC of LR and RF, AB, and DT was statistically significant (P = .023,.005,.030).However, when compared with other ML models, there was no statistically significant difference.The results are summarized in Table 5.

Discussion
Our study demonstrated that certain radiomic features derived from CE-CBBCT can be used to predict HER2-low BC.The LDA and LR classifiers achieved the highest AUC (0.880) and demonstrated a superior performance in the validation cohort.However, the LDA classifier exhibited higher accuracy (0.872) than the LR classifier (0.846).In contrast, the DT model showed a lower AUC value (0.713), indicating a tendency toward overfitting.ML models based on CE-CBBCT radiomics have the potential to effectively predict HER2-low BC in a noninvasive manner.These models could be valuable for supporting clinical diagnoses and guiding treatment decisions.
To the best of our knowledge, this is the first study to investigate the correlation between radiomic features extracted from CE-CBBCT images and the HER2-low BC status.Bian et al [25] conducted a study in which they utilized intratumoral and peritumoral radiomic characteristics from multiparametric MRI to differentiate HER2-low from HER2-zero breast tumors, achieving AUC of 0.776 and 0.71 in internal and external validation.Similarly, another study [26] found that radiomic features extracted from multiparametric MRI were capable of differentiating HER2-low from HER2-zero BCs, achieving an AUC of 0.79.However, our study surpassed these previous findings in terms of superior predictive performance, achieving an AUC of 0.880.This disparity may be attributed to the fact that CE-CBBCT can simultaneously exhibit abnormal enhancement and calcifications, which is a remarkable advantage that MRI cannot achieve. [12]ompared to HER2-zero, HER2-low BCs show a higher incidence of HR positivity and mutations in the PI3K/AKT/mTOR pathway, which have been identified as the main factors contributing to trastuzumab resistance. [7,9]The clinicopathological and molecular characteristics of these 2 subtypes can be observed macroscopically in CE-CBBCT, whereas subtle variations can be quantified using high-throughput radiomics features.The HR status and PI3K/AKT/mTOR pathway may form the biological basis of this radiomic model.However, this hypothesis should be confirmed by future radiogenomic studies.
The generalizability of ML models benefits from feature selection, which aims to extract the most informative features while eliminating noisy, irrelevant, and redundant ones. [37]The 3 main methods for feature selection are filtering, wrappers, and embedding methods. [38]In this study, we employed PCC [39] and RFE [37] for feature selection and reduction.PCC is a widely used filter method that ranks features based on their correlation with a class, removing those below a certain threshold.Although less computationally intensive, filter methods may lead to models with reduced predictive performance compared to wrapper or embedded methods.RFE is [37] a wrapper method that identifies features with minimal impact on performance when removed.Unlike filter methods, wrapper methods utilize the classifier's performance as a metric for selecting the best feature subset, resulting in higher predictive performance.However, wrapper methods are influenced by the choice of the classifier.In our study, ML models were developed by incorporating 7 optimal radiomic features including 2 LoG and 5 wavelet features.Both wavelet and LoG features highlight variations in gray-level regions within the tumor, capturing tumor heterogeneity more effectively than the simpler texture features.This can be attributed to the fact that higher-order filtering obtained through image transformation enhances the visibility of the subtle details of the tumor texture.Although the exact biological characteristics captured by these 2 features cannot be determined, it is hypothesized that these subtle details may be associated with slight changes in microvessel density and the tumor microenvironment. [20]he current standardized techniques for evaluating HER2 expression are IHC and FISH, although disagreement has been reported between the central and local assessments of HER2-zero and HER2-low scores. [40]Typically, this assessment is based on a single biopsy of a potentially heterogeneous tumor, providing only a local snapshot of its biological characteristics. [35]By incorporating radiomic signatures as diagnostic tools into their workflow, pathologists can enhance their awareness of potential HER2 expression in tumors.Therefore, the radiomic signature could serve as a noninvasive biomarker for evaluating whole-tumor heterogeneity and assist in identifying the most appropriate target for biopsy to maximize the possibility of detecting HER2 expression in cases of multifocal involvement.Furthermore, HER2 expression can be repeatedly assessed to monitor changes in the spatiotemporal biology of tumors as the disease progresses.Considering the limited treatment options available for resistant luminal and triple-negative subpopulations at advanced stages, retesting for low HER2 expression using radiomic guidance may be an important option for recruiting patients for ongoing clinical trials of anti-HER2 therapies.
ML is a computational approach that leverages data to enhance performance and make accurate predictions.In this study, we used 6 supervised classification algorithms: LR, LDA, SVM, DT, RF, and AB.They are commonly used to classify breast lesions or differentiate between the molecular subtypes of BC. [41] LR and LDA aim to identify linear relationships between input data and outcome variables, providing simple and interpretable results. [42]The SVM operates by transforming data to enable the separation of classes through a hyperplane, which is represented as a line in a 2-dimensional space.Support vectors located closest to this hyperplane play a crucial role in the classification. [43]The RF algorithm constructs multiple DTs with varying depths by using random subsets of data and features at each split to create uncorrelated trees.The final prediction was the average of all trees, resulting in an unbiased model. [43]DT is specifically used for early BC detection and organizes the dataset into smaller subsets for classification. [44]AB addresses these challenges by combining "weak" models to form a diverse and accurate model with strong generalization capabilities. [42,45]o date, scholars have applied ML algorithms combined with clinical or imaging data to build predictive models to further improve the accuracy. [24,35]To obtain the best prediction model, some scholars have compared the prediction performances of different ML algorithms. [24,35,46]Zhang [35] developed 5 ML models based on the radiomic features of dynamic contrast-enhanced MRI and non-mono-exponential modelbased diffusion-weighted imaging to predict HER2+ and HER2− status, including LR, RF, AB, SVM, and LDA.Their findings revealed that both the RF and AB models exhibited   superior predictive abilities compared to the other 3 models.Similarly, Sheng et al [46] found that the RF model outperformed other ML models by extracting radiomic features from dynamic contrast-enhanced-MRI to predict HER2-overexpressed and non-HER2-overexpressed BC.
In contrast to previous studies, our study revealed that LDA and LR exhibit superior predictive performance.We hypothesize that this disparity may be due to the fact that previous models did not consider the HER2-low category, despite this category representing the majority of BCs and being the therapeutic target of the new antibody-drug conjugates.

Limitations
First, this was a prospective study with a relatively small sample size; a larger sample size is required for further studies.Second, we used an immunohistochemical surrogate as a prognostic biomarker for BC rather than wholegene sequencing.Third, the ROI segmentation in this study was performed manually, which may introduce potential errors due to human involvement.Finally, it is unclear whether the current predictive models can be applied to different imaging protocols and machines in other institutions.

Conclusion
ML models based on CE-CBBCT radiomics features achieved excellent performance in the preoperative prediction of HER2low BC, and LDA and LR showed superior predictive capabilities, suggesting that they could be valuable tools for guiding precise and personalized targeted therapies.These are preliminary results, and a larger sample size is required to validate our conclusions.

Figure 1 .
Figure 1.Patient enrollment flowchart with inclusion and exclusion criteria.

Figure 3 .
Figure 3. Receiver operating characteristic curves of machine learning model for predicting HER2 low breast cancer in the training and validation cohort.HER2 = human epidermal growth factor receptor 2.

Table 1
Patient characteristics in the training and validation cohort.

Table 2
CBBCT findings of patients in the training and validation cohorts.

Table 3
Intra-and inter-correlation coefficients for optimal selected features.

Table 4
Evaluation indicators of the predictive performance of 6 models for predicting HER2-low.

Table 5
DeLong test results for each receiver operating characteristic curve for the validation cohort.AdaBoost, DT = decision tree, LDA = linear discriminant analysis, LR = logistic regression, RF = random forest, SVM = support vector machine.